Yukon
MONSTER: Monash Scalable Time Series Evaluation Repository
Dempster, Angus, Foumani, Navid Mohammadi, Tan, Chang Wei, Miller, Lynn, Mishra, Amish, Salehi, Mahsa, Pelletier, Charlotte, Schmidt, Daniel F., Webb, Geoffrey I.
We introduce MONSTER (the MONash Scalable Time Series Evaluation Repository), a collection of large datasets for time series classification. The field of time series classification has benefitted from common benchmarks set by the UCR and UEA time series classification repositories. However, the datasets in these benchmarks are small, with median sizes of 217 and 255 examples, respectively. In consequence they favour a narrow subspace of models that are optimised to achieve low classification error on a wide variety of smaller datasets, that is, models that minimise variance, and give little weight to computational issues such as scalability. Our hope is to diversify the field by introducing benchmarks using larger datasets. We believe that there is enormous potential for new progress in the field by engaging with the theoretical and practical challenges of learning effectively from larger quantities of data.
Elaborative Subtopic Query Reformulation for Broad and Indirect Queries in Travel Destination Recommendation
Wen, Qianfeng, Liu, Yifan, Zhang, Joshua, Saad, George, Korikov, Anton, Sambale, Yury, Sanner, Scott
In Query-driven Travel Recommender Systems (RSs), it is crucial to understand the user intent behind challenging natural language (NL) destination queries such as the broadly worded "youth-friendly activities" or the indirect description "a high school graduation trip". Such queries are challenging due to the wide scope and subtlety of potential user intents that confound the ability of retrieval methods to infer relevant destinations from available textual descriptions such as WikiVoyage. While query reformulation (QR) has proven effective in enhancing retrieval by addressing user intent, existing QR methods tend to focus only on expanding the range of potentially matching query subtopics (breadth) or elaborating on the potential meaning of a query (depth), but not both. In this paper, we introduce Elaborative Subtopic Query Reformulation (EQR), a large language model-based QR method that combines both breadth and depth by generating potential query subtopics with information-rich elaborations. We also release TravelDest, a novel dataset for query-driven travel destination RSs. Experiments on TravelDest show that EQR achieves significant improvements in recall and precision over existing state-of-the-art QR methods.
Scalable mixed-domain Gaussian process modeling and model reduction for longitudinal data
Timonen, Juho, Lähdesmäki, Harri
Gaussian process (GP) models that combine both categorical and continuous input variables have found use in longitudinal data analysis and computer experiments. However, standard inference for these models has the typical cubic scaling, and common scalable approximation schemes for GPs cannot be applied since the covariance function is non-continuous. In this work, we derive a basis function approximation scheme for mixed-domain covariance functions, which scales linearly with respect to the number of observations and total number of basis functions. The proposed approach is also naturally applicable to Bayesian GP regression with discrete observation models. We demonstrate the scalability of the approach and compare model reduction techniques for additive GP models in a longitudinal data context. We confirm that we can approximate the exact GP model accurately in a fraction of the runtime compared to fitting the corresponding exact model. In addition, we demonstrate a scalable model reduction workflow for obtaining smaller and more interpretable models when dealing with a large number of candidate predictors.
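To give a feel for what a basis function approximation of a covariance function looks like, the following is a minimal sketch for the continuous part only. The abstract does not state which basis is used, so the sine (Hilbert-space) basis, the squared exponential kernel, and all names and parameter values here (`hs_features`, `half_width`, `num_basis`) are assumptions for illustration; the categorical part of a mixed-domain kernel is not covered.

```python
import numpy as np

def se_kernel(x1, x2, alpha=1.0, ell=1.0):
    """Exact squared exponential kernel matrix between 1-D inputs."""
    d = x1[:, None] - x2[None, :]
    return alpha**2 * np.exp(-0.5 * (d / ell) ** 2)

def hs_features(x, num_basis=32, half_width=4.0, alpha=1.0, ell=1.0):
    """Feature matrix F (N x M) such that F @ F.T approximates se_kernel(x, x).

    Assumed basis: sine eigenfunctions of the Laplacian on
    [-half_width, half_width], weighted by the kernel's spectral density.
    """
    j = np.arange(1, num_basis + 1)
    sqrt_lam = np.pi * j / (2.0 * half_width)  # square roots of the eigenvalues
    phi = np.sin(sqrt_lam[None, :] * (x[:, None] + half_width)) / np.sqrt(half_width)
    # Spectral density of the 1-D squared exponential kernel at each frequency
    spd = alpha**2 * np.sqrt(2.0 * np.pi) * ell * np.exp(-0.5 * (ell * sqrt_lam) ** 2)
    return phi * np.sqrt(spd)[None, :]

# Inputs kept well inside the approximation domain [-4, 4]
x = np.linspace(-1.0, 1.0, 50)
features = hs_features(x)
max_err = np.max(np.abs(features @ features.T - se_kernel(x, x)))
print(features.shape, max_err)
```

Building the feature matrix costs time proportional to the number of observations times the number of basis functions, which is the kind of scaling the abstract refers to. How the authors handle the categorical inputs and the downstream inference is not shown here.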
LLM Processes: Numerical Predictive Distributions Conditioned on Natural Language
Requeima, James, Bronskill, John, Choi, Dami, Turner, Richard E., Duvenaud, David
Machine learning practitioners often face significant challenges in formally integrating their prior knowledge and beliefs into predictive models, limiting the potential for nuanced and context-aware analyses. Moreover, the expertise needed to integrate this prior knowledge into probabilistic modeling typically limits the application of these models to specialists. Our goal is to build a regression model that can process numerical data and make probabilistic predictions at arbitrary locations, guided by natural language text which describes a user's prior knowledge. Large Language Models (LLMs) provide a useful starting point for designing such a tool since they 1) provide an interface where users can incorporate expert insights in natural language and 2) provide an opportunity for leveraging latent problem-relevant knowledge encoded in LLMs that users may not have themselves. We start by exploring strategies for eliciting explicit, coherent numerical predictive distributions from LLMs. We examine these joint predictive distributions, which we call LLM Processes, over arbitrarily-many quantities in settings such as forecasting, multi-dimensional regression, black-box optimization, and image modeling. We investigate the practical details of prompting to elicit coherent predictive distributions, and demonstrate their effectiveness at regression. Finally, we demonstrate the ability to usefully incorporate text into numerical predictions, improving predictive performance and giving quantitative structure that reflects qualitative descriptions. This lets us begin to explore the rich, grounded hypothesis space that LLMs implicitly encode.
Balloons, 'objects' – what's in the sky above the US?
Los Angeles, California – The United States military shot down a flurry of objects this month: a large object it identified as a Chinese surveillance balloon followed by three smaller objects that the government said might be "benign". The airborne objects were drifting through airspace increasingly crowded with commercial and amateur balloons, drones and possible aerial surveillance craft belonging to adversaries. Their rising numbers pose a challenge to aviators and government agencies. Experts say that while heavy commercial balloons must meet strict Federal Aviation Administration (FAA) regulations, lighter amateur balloons are exempt from most rules, and the FAA might not be able to track them. Military and intelligence officials found no evidence that the three smaller objects were conducting surveillance for another country, and they were not sending communication signals, National Security Council spokesman John Kirby said at a White House briefing on Monday.
The top 10 weird and wonderful scientific discoveries of 2022
From a pig heart being successfully transplanted into a human, to being able to redirect an asteroid on a collision course with Earth, there have been all manner of weird and wonderful scientific discoveries in 2022. They include the human genome finally being mapped after two decades, the unearthing of Africa's oldest known dinosaur, and the release of the first ever image of a supermassive black hole at the heart of our Milky Way galaxy. There was also the alarming discovery that microplastics are everywhere – including in us – and the hugely-anticipated first images from the world's most powerful space telescope James Webb, which will peer back to the dawn of the universe. Here, MailOnline looks at 10 of the most interesting advances this year. The year began with a bang scientifically when just a week into it a dying man became the first patient in the world to get a heart transplant from a genetically-modified pig.
Deep Learning Models for River Classification at Sub-Meter Resolutions from Multispectral and Panchromatic Commercial Satellite Imagery
Moortgat, Joachim, Li, Ziwei, Durand, Michael, Howat, Ian, Yadav, Bidhyananda, Dai, Chunli
Remote sensing of the Earth's surface water is critical in a wide range of environmental studies, from evaluating the societal impacts of seasonal droughts and floods to the large-scale implications of climate change. Consequently, a large literature exists on the classification of water from satellite imagery. Yet, previous methods have been limited by 1) the spatial resolution of public satellite imagery, 2) classification schemes that operate at the pixel level, and 3) the need for multiple spectral bands. We advance the state-of-the-art by 1) using commercial imagery with panchromatic and multispectral resolutions of 30 cm and 1.2 m, respectively, 2) developing multiple fully convolutional neural networks (FCN) that can learn the morphological features of water bodies in addition to their spectral properties, and 3) training FCN that can classify water even from panchromatic imagery. This study focuses on rivers in the Arctic, using images from the Quickbird, WorldView, and GeoEye satellites. Because no training data are available at such high resolutions, we construct those manually. First, we use the RGB and NIR bands of the 8-band multispectral sensors. Those trained models all achieve excellent precision and recall over 90% on validation data, aided by on-the-fly preprocessing of the training data specific to satellite imagery. In a novel approach, we then use results from the multispectral model to generate training data for FCN that only require panchromatic imagery, of which considerably more is available. Despite the smaller feature space, these models still achieve a precision and recall of over 85%. We provide our open-source codes and trained model parameters to the remote sensing community, which paves the way to a wide range of environmental hydrology applications at vastly superior accuracies and 2 orders of magnitude higher spatial resolution than previously possible.
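For readers unfamiliar with the precision and recall figures quoted above, the following is a minimal sketch of how the two metrics can be computed for a binary water mask. The function name `precision_recall` and the toy masks are hypothetical and are not taken from the authors' released code.

```python
import numpy as np

def precision_recall(pred_mask, true_mask):
    """Pixel-wise precision and recall for binary masks (nonzero = water)."""
    pred = np.asarray(pred_mask, dtype=bool)
    true = np.asarray(true_mask, dtype=bool)
    tp = np.count_nonzero(pred & true)    # water predicted, water in reference
    fp = np.count_nonzero(pred & ~true)   # water predicted, land in reference
    fn = np.count_nonzero(~pred & true)   # land predicted, water in reference
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall

# Toy 3x4 masks: 5 reference water pixels, 5 predicted, 4 in common
true_mask = np.array([[0, 1, 1, 0],
                      [0, 1, 1, 0],
                      [0, 0, 1, 0]])
pred_mask = np.array([[0, 1, 1, 0],
                      [0, 1, 0, 0],
                      [0, 1, 1, 0]])
precision, recall = precision_recall(pred_mask, true_mask)
print(precision, recall)  # 0.8 0.8
```

Precision is the share of predicted water pixels that are water in the reference mask, and recall is the share of reference water pixels that the prediction recovers.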
Photographer captures highest resolution shots of snowflakes ever
A renowned photographer has captured the highest resolution shots of snowflakes ever using a homemade prototype described as one part microscope and one part camera. Nathan Myhrvold, an American scientist, inventor, photographer and ex-chief technology officer of Microsoft, took 18 months to build the 100 megapixel camera capable of capturing a snowflake's microscopic detail. Using the camera, which he describes as the 'highest resolution snowflake camera in the world', he took 100 frames of each snowflake in quick succession then stacked them so that the whole image is in focus. The results show the lush variety of snowflakes measuring only a few tens of millimetres in diameter, captured when Myhrvold was in Alaska and Canada. Pictured, stellar dendrite captured in Yellowknife, Canada.
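The stacking step described above is commonly known as focus stacking. The article does not say how the frames were combined, so the following is only a minimal sketch of one common heuristic, which takes each pixel from the frame where a local sharpness measure is largest. The function names and the synthetic test frames are assumptions for illustration.

```python
import numpy as np

def laplacian_abs(img):
    """Absolute 4-neighbour Laplacian, used here as a per-pixel sharpness score."""
    padded = np.pad(img, 1, mode="edge")
    lap = (padded[:-2, 1:-1] + padded[2:, 1:-1]
           + padded[1:-1, :-2] + padded[1:-1, 2:]
           - 4.0 * padded[1:-1, 1:-1])
    return np.abs(lap)

def focus_stack(frames):
    """Combine grayscale frames by taking each pixel from its sharpest frame."""
    stack = np.stack([np.asarray(f, dtype=float) for f in frames])  # (n, H, W)
    sharpness = np.stack([laplacian_abs(f) for f in stack])
    best = np.argmax(sharpness, axis=0)                              # (H, W)
    return np.take_along_axis(stack, best[None, ...], axis=0)[0]

# Synthetic scene: a checkerboard that is "in focus" on the left half of
# frame_a and on the right half of frame_b; out-of-focus areas are flat grey.
checker = (np.indices((6, 8)).sum(axis=0) % 2).astype(float)
cols = np.arange(8)[None, :]
frame_a = np.where(cols < 4, checker, 0.5)
frame_b = np.where(cols < 4, 0.5, checker)
stacked = focus_stack([frame_a, frame_b])
print(np.array_equal(stacked, checker))
```

Real focus-stacking pipelines typically also align the frames and smooth the per-pixel selection, neither of which is attempted here.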